---
name: literature-experiment-extract
description: Extract experimental models, experimental methods, and biomarker information from paper Markdown (typically produced by PDF-to-Markdown tools) when a user provides paper Markdown and needs a structured, evidence-backed summary (1 Markdown + 3 CSVs).
license: MIT
author: aipoch
---

> **Source**: [https://github.com/aipoch/medical-research-skills](https://github.com/aipoch/medical-research-skills)

## When to Use

- You have a paper converted to Markdown (e.g., via PDF-to-Markdown) and need to extract **cell/animal models** used in experiments.
- You need a structured list of **experimental methods/protocols** described in the paper, with traceable evidence.
- You want to compile **biomarkers / detection indicators** (e.g., genes, proteins, assays, readouts) reported in the study.
- You need standardized outputs for downstream analysis: **one Markdown summary plus three CSV tables**.
- The paper Markdown includes page markers (e.g., `## Page XX`) and you want evidence organized **by page**.

## Key Features

- Extracts three entity groups from paper Markdown:
  - **Experimental models** (cell lines, animal models, strains, genotypes, etc.)
  - **Experimental methods** (assays, protocols, instruments, conditions)
  - **Biomarkers / indicators** (targets, readouts, measured variables)
- Produces **evidence-backed** results (citations/excerpts preserved and traceable to the source).
- Supports **page-aware evidence organization** when the input includes pagination headers like `## Page XX`.
- Outputs are fixed and standardized:
  - **1 Markdown summary**
  - **3 CSV files**: models / methods / biomarkers
- Uses a predefined template and extraction rules:
  - Requirements and consistency rules: `references/guide.md`
  - Output template: `assets/template.md`

## Dependencies

- None (documentation-driven workflow).
- Input assumption: paper content is available as **Markdown**, typically generated by a **PDF-to-Markdown** tool.
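Page-aware evidence organization depends on locating the `## Page XX` headers in the input. The skill itself ships no code, so the following is only a minimal Python sketch of that step, assuming each header sits on its own line in the exact form `## Page <number>`. The names `PAGE_HEADER` and `split_pages` are hypothetical and not part of this skill.

```python
import re

# Hypothetical pattern: assumes headers look exactly like "## Page 12" on their own line.
PAGE_HEADER = re.compile(r"^##[ \t]+Page[ \t]+(\d+)[ \t]*$", re.MULTILINE)


def split_pages(markdown: str) -> dict[int, str]:
    """Split paper Markdown into {page_number: page_text}.

    Text before the first page header is stored under page 0.
    If no page headers are found, the whole document is returned as page 0.
    """
    matches = list(PAGE_HEADER.finditer(markdown))
    if not matches:
        return {0: markdown.strip()}

    pages: dict[int, str] = {}
    preamble = markdown[: matches[0].start()].strip()
    if preamble:
        pages[0] = preamble

    for i, match in enumerate(matches):
        page_no = int(match.group(1))
        # A page's text runs from the end of its header to the start of the next header.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        body = markdown[match.end():end].strip()
        # Append rather than overwrite if a page number appears more than once.
        if page_no in pages:
            pages[page_no] += "\n" + body
        else:
            pages[page_no] = body
    return pages
```

Each page's text can then be scanned for models, methods, and biomarkers, with the page number carried along as the evidence location. Falling back to a single page 0 keeps the sketch usable on inputs that lack pagination headers.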
## Example Usage

### Input

A paper converted to Markdown, ideally with page headers:

```md
## Page 1
... text describing "C57BL/6 mice" and "Western blot" ...

## Page 2
... text describing "ELISA" and "IL-6 levels" ...
```

### Steps

1. Open the paper Markdown (typically produced by PDF-to-Markdown tools).
2. Extract **models**, **methods**, and **biomarkers** page by page.
3. Follow:
   - Extraction rules and evidence requirements: `references/guide.md`
   - Output template: `assets/template.md`
4. Output **exactly**:
   - `outputs/{Paper Abbreviation}-experiment-summary.md`
   - `outputs/{Paper Abbreviation}-models.csv`
   - `outputs/{Paper Abbreviation}-methods.csv`
   - `outputs/{Paper Abbreviation}-biomarkers.csv`

### Output (required)

- All final outputs must be **UTF-8** encoded.
- Output must be produced **directly** (no confirmation steps or optional branches).
- Evidence excerpts must remain in the **original language** of the source literature.

## Implementation Details

- **Input parsing**
  - Read the paper Markdown as the sole input source.
  - If pagination headers like `## Page XX` exist, prioritize attaching evidence to the corresponding page.
- **Extraction rules**
  - Apply entity definitions, allowed/expected fields, normalization rules, and evidence formatting as specified in `references/guide.md`.
- **Output formatting**
  - Generate outputs using `assets/template.md` as the canonical structure.
  - Add rows as needed while preserving evidence citations/excerpts.
  - The output set is fixed: **1 Markdown summary + 3 CSVs** (models/methods/biomarkers).
- **Paths and naming**
  - Default output directory: `outputs/`
  - Naming:
    - Markdown: `outputs/{Paper Abbreviation}-experiment-summary.md`
    - CSVs:
      - `outputs/{Paper Abbreviation}-models.csv`
      - `outputs/{Paper Abbreviation}-methods.csv`
      - `outputs/{Paper Abbreviation}-biomarkers.csv`
- **Language**
  - Output language should be **Chinese by default** (or the user-requested language if specified).
  - Evidence excerpts must remain in the **original language** of the source text.
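The naming and encoding rules above can be sketched in Python as follows. This is a minimal illustration, not part of the skill: it assumes `{Paper Abbreviation}` is supplied as a plain string and that each CSV table is a list of dictionaries. The helpers `output_paths` and `write_csv` are hypothetical, and any column names you pass are placeholders, since the actual fields are defined in `references/guide.md`.

```python
import csv
from pathlib import Path

# The three CSV tables are fixed by the skill: models / methods / biomarkers.
TABLES = ("models", "methods", "biomarkers")


def output_paths(abbrev: str, out_dir: str = "outputs") -> dict[str, Path]:
    """Return the four required output paths for one paper (hypothetical helper)."""
    base = Path(out_dir)
    paths = {"summary": base / f"{abbrev}-experiment-summary.md"}
    for table in TABLES:
        paths[table] = base / f"{abbrev}-{table}.csv"
    return paths


def write_csv(path: Path, rows: list[dict[str, str]], fieldnames: list[str]) -> None:
    """Write rows to a UTF-8 CSV; excerpt text is written as given, with no translation."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # newline="" lets the csv module control line endings, as its documentation recommends.
    with path.open("w", encoding="utf-8", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
```

For example, with a placeholder abbreviation, `output_paths("DEMO")["models"].as_posix()` evaluates to `"outputs/DEMO-models.csv"`. Passing `encoding="utf-8"` explicitly, rather than relying on the platform default, is what keeps non-English evidence excerpts intact.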